perm filename RAND[TLK,DBL] blob sn#198231 filedate 1976-01-25 generic text, type T, neo UTF8
.COMMENT !XGPCOMMANDS←"/PMAR=2200";
.DEVICE XGP
.FONT 1 "BASL30"
.FONT 2 "BASB30"
.FONT 4  "BASI30"
.FONT 5  "BDR40"
.FONT 6  "NGR25"
.FONT 7  "NGR20"
.FONT 8 "SUP"
.FONT 9 "SUB"
.TURN ON "↑α↓_π[]{"
.TURN ON "⊗" FOR "%"
.TURN ON "@" FOR "%"
.PAGE FRAME 54 HIGH 83 WIDE
.AREA TEXT LINES 1 TO 53
.COUNT PAGE PRINTING "1"
.TABBREAK
.ODDLEFTBORDER←EVENLEFTBORDER←1000
.AT "ffi" ⊂ IF THISFONT ≤ 4 THEN "≠"  ELSE "fαfαi" ⊃;
.AT "ffl" ⊂ IF THISFONT ≤ 4 THEN "α∞" ELSE "fαfαl" ⊃;
.AT "ff"  ⊂ IF THISFONT ≤ 4 THEN "≥"  ELSE "fαf" ⊃;
.AT "fi"  ⊂ IF THISFONT ≤ 4 THEN "α≡" ELSE "fαi" ⊃;
.AT "fl"  ⊂ IF THISFONT ≤ 4 THEN "∨"  ELSE "fαl" ⊃;
.SELECT 1
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO E ⊂ APART END ⊃
.MACRO FAD ⊂ FILL ADJUST DOUBLE SPACE PREFACE 2 ⊃
.MACRO FAS ⊂ FILL ADJUST SINGLE SPACE PREFACE 1 ⊃
.FAS
.COMPACT
.SELECT 1
.PORTION THESIS
.PAGE←0
.NEXT PAGE
.INDENT 0
.TURN OFF "{∞→}"   
.BEGIN CENTER 
⊗5↓_Automating the Discovery of Scientific Concepts_↓⊗*


Text for a seminar at RAND, Friday, Feb. 13, 1976, 10:00am


⊗2Douglas B. Lenat⊗*
Artificial Intelligence Lab
Stanford University




.END

When we  hear of  some highly-lauded  new discovery  in our field,  a
frequent reaction is  "Why didn't I think of that?  It's so obvious."
Even worse are the times we ⊗4did⊗* consider it, but dismissed  it as
not worth  pursuing.  Discoveries  are hard  to make, but  once done,
they  often  seem  obvious.   In  a  minute  I'll  give  one possible
explanation for that phenomenon, by proposing a simple  model for how
new concepts are formed in our minds.

In  case  you  were hoping  for  some  fascinating  new insight  into
cognition, you  may  be disappointed  to  learn that  the  model  I'm
talking about is the standard paradigm of ⊗4heuristic search⊗*.

A computer program has been developed, embodying that model, aimed at
formulating new mathematical theories. The fact that I'm here talking
to you should indicate that it has had some measure of success.


Before we consider how to create new theories, let's focus in  on how
we might  explain or account for  some particular discovery.   As our
first  example, consider  the concept  of prime  numbers.   How could
someone  rationally  have discovered  that  notion?  Can  we  somehow
analyze their discovery?

Suppose you  already know about  counting and arithmetic,  and you've
just been playing with the notion of "divisors-of" of a number.   One
rule of thumb that you probably follow  when devising new theories is
to consider extreme  cases.  In more formal terms, what we are saying
is that if  we've just  been studying an  interesting function  which
takes  elements of  set A  into those  of set  B, and  there is  some
extremal subset  of B, it's worth our time to consider which elements
from A map  into that extremal set.  Even more abstractly, one  could
say that  the inverse image of  an unusual set is often  unusual.  If
our sets  A  and B  are  the natural  numbers,  and the  function  is
"number-of-divisors-of", then an extremal set of numbers might be the
set (1,2) of  the smallest numbers, and their inverse image is simply
those numbers having at most 2 distinct divisors -- i.e., the primes.

So we have  reduced our task  from "how  could you possibly  discover
primeness"  to the  somewhat  simpler  task of  "how  could you  ever
discover  the concept of  divisors-of".   We have turned  a very hard
problem into  a hard problem.   We did  it by  citing a heuristic,  a
rule-of-thumb which is certainly not guaranteed to work in all cases,
but which is worth remembering and using because it frequently  leads
to worthwhile new concepts.

Getting back  to our  task here,  suppose we  knew how  to count  the
number of elements in a set, and how to take the cross-product of two
sets.  Then we  might display our  knowledge pictorially thus, and  a
quite natural question is "what is  this missing operation?".  We can
compute  it easily, at  least in  this direction. For  example, if we
consider the two numbers 7 and 8, then our first step  is to find two
sets  with  7  and   8  elements,  respectively.    Then  take  their
cross-product, then count how  many members there are.   We find  56.
Sure enough, this operation is multiplication,  and its inverse, this
relation, is divisors-of.

So we could draw a chain of reductions, from primes to divisors-of to
multiplication to counting and cross-product. Each step  is justified
by  some heuristic,  here  it  is "consider  extremals",  here it  is
"consider  the  inverse  of an  interesting  operation",  here  it is
"complete the square  diagram".  This  analysis could be carried  out
indefinitely,  but  after  a  while  the  concepts  involved  get  so
primitive that "discovering" them makes little sense.

Very often, a new discovery is really  just one step in this kind  of
process, just  one heuristic  application away  from what is  already
known.   That is  why it's  often easy for  us to believe  that there
wasn't really so  much effort needed  to make the  discovery.  If  we
already  know  this concept,  and  this  heuristic, it  almost  seems
obvious that someone would make this step.

Why  then don't we make discoveries  continuously, every few minutes?
Why are even these  1-step advances worth publishing? The  answer can
be found by stating explicitly how we think discoveries are made.

We're  talking about  reversing  the flow  of time  here.  Instead of
analyzing a discovery by breaking  it into simpler and simpler  ones,
we're now progressing in this direction, trying  to grow new nodes on
this tree, using whatever heuristics are available.

The researcher has a bunch of concepts that he knows about, say a few
thousand, and  a  bunch  of  heuristics he  can  apply,  say  several
hundred.   In any situation,  then, he  may literally have  a million
alternatives   to  consider,  all  of   them  plausible  avenues  for
investigation.  But nature is  not kind to us, and only a  very small
percentage  of those new  concepts will  turn out to  be interesting.
What is even worse,  you can't whiz through  this space, stopping  at
the interesting  ones, because  it may  require months  of effort  to
decide  whether  the  concept  you're  studying  is worthwhile  or  a
dead-end.

Let's take our  tiny math  example again,  where we have  just a  few
concepts and heuristics.  Even this tree becomes bushy when we try to
go  in this direction. We  could apply "looking  at inverse" here, to
decide  to  investigate  the  kind  of  operation  we  normally  call
projection.   Or  here: we  could get  bogged down  in the  morass of
examining numbers  which  have  very  many,  rather  than  very  few,
divisors. It's  not clear that  primes are  at all interesting  until
you've  investigated   them  thoroughly  enough  to  conjecture  some
relationships between them and other concepts.

We can sum up this argument by saying that synthesis of  new concepts
is a combinatorially explosive process,  while analysis is not.  That
is  why it  is  harder to  make new  discoveries than  to think  up a
plausible scenario for how a given concept was discovered.

I'm  saying that  theory  formation  can  be considered  a  heuristic
search:  That is,  a continual  expansion of a  tree of  concepts and
relationships, guided  by some  judgmental criteria,  some  heuristic
rules of thumb.

If this is true, one should be able to write a computer program which
proposes  new concepts in some  domain.  After  picking the domain in
which the system  will work, one must  determine the initial core  of
knowledge that  it will start  with, and collect  the heuristics that
experts in that  field use when  trying to form  theories.  Let's  go
through this development step by step.

I chose to have  my system try to do research  in mathematics, partly
because I have  some experience in that activity, and could therefore
extract the needed  heuristics by introspection, 
instead of having to elicit them from other people.
Also, math is  the
only  natural science  which doesn't  have to  contend with  errorful
laboratory  data.  There are some other  reasons, which I'll get into
later if you're interested.

So my  first step  was to  list all the  concepts I  would allow  the
system to know about. That is, a bunch of primitive notions which are
considered already-known by  the system.  This will  be the  starting
state of  knowledge.   But  how can  one decide  which concepts  to
include, and  which ones not to?   Three kinds of  sets come to mind.
The first is a ⊗4complete core of knowledge⊗*, 
where there are enough concepts  to
derive  all of  past and  future mathematics.  Well, that  is clearly
ill-defined, 
there's no way to be sure that your choice of basic concepts is complete.
At the other extreme I  considered starting the system off with a
⊗4minimal⊗* repertoire of concepts, meaning that  if you removed any one  of them,
it could never be re-derived from the remaining ones. This core was so small
it seemed almost meaningless; the truly fundamental ideas were so few that  the
system would be  working in a barren field  for a long time.  A third
kind  of starting set  came to  mind, one which  included just  those concepts
which young children  seem to  have, which Piaget  might have  called
⊗4pre-numerical⊗*.  These included  static  structural concepts  like
sets and relations, and active ones  like substitution, union, reversal, and
the composition of relations.
In all,  about  100 concepts  were  called for.  This  is the  set  I
actually  chose to let  the system  begin with.   Notice  that simply
executing some of the active concepts can produce new concepts.   Say
one of our starting concepts is  Compose-2-relations.  Then Composing
the  two  relations  Intersection  and  Complement,  we can  get  the
relation    Intersection(x,    Complement(y))    which    is     just
Set-difference(x,y).    If this  operation  weren't  in our  core  of
concepts, it could then be added.  Here are some of the concepts that
AM actually started with: <inital core slide>.

In addition to specifying  the initial concepts, we must  collect the
heuristics that the  system will be guided by. One  way to do this is
to imagine the system making  various discoveries, and deciding  what
rules it will need to  know.  Another, somewhat fairer way  is to get
experts  to introspect on all  the clues they use,  all the knowledge
that guides  their research efforts.   Unfortunately,  many of  these
heuristics  are  demon-like, not  even  occurring  to him  except  in
situations  where they might be used. So  it would be best to prepare
some exhaustive questionaire  for the experts, to  systematically put
them  in all  conceivable situations,  while they  are introspecting.
We'll return to how to do this in a minute.

Some of the heuristics suggest new concepts to look at (like "look at
the inverse").  Others gave hints for ⊗4when⊗*  to do things ("if you
can't find any examples of when a predicate is satisfied, then try to
generalize it"). Still others told ⊗4how⊗* to do something  ("to fill
in examples  of A, see what  operators have a range  which intersects
with  A, and try applying them").   Finally, there are heuristics for
filling  in  new  heuristics  (like  "after   using  the  `extremals'
heuristic to derive a new subset S of A, then add the heuristics that
` A is a  good set to  use when testing  conjectures involving f  and
f↑-↑1. ' ").

The final task, at this level of detail, was to decide precisely what
informaton  I would supply the  system about each  of these concepts.
Certainly we should provide the definition or some other definite way
of specifying each concept.  Several alternative names would be nice.
Maybe we'll want to give examples of each... no, the system should be
able to find  those. For  operations, we can  mention the doamin  and
range. Maybe we'll point to some more general or specialized concept.
Perhaps we know of some abstract representation in which this concept
can be quickly manipulated analogically (e.g.,  Venn diagrams for the
concept  of Sets).  Also,  to help the  system decide what  to do, it
would be nice to supply some  kind of rough judgment of the worth  of
this concept, its interest, its usefulness, its simplicity, etc.

So now we've  indicated the starting concepts  listed the heuristics.
How  do we actually  implement this as  a computer program?   What is
left to do? We must make precise our  representations and algorithms.
How is  a concept to be  represented? How are  the heuristics stored?
How are  they used?  What  is the  global  control structure  of  the
system?

Let's turn to the  heuristics. What requirements can we  put on them?
Well,  it  would  be nice  to  have  each one  come  to  the system's
attention only when  it plausibly  could be used.   Sometimes,  there
will  be  special information  that  just  pertains  to one  kind  of
concept.  For example,  some of the  heuristics tell  whether a given
composition is interesting or  not. There's no reason to  access them
unless we're  trying to judge a  composition. In fact,  by looking at
the heuristics, it seemed to  me that only a  few very weak ones  are
truly "global". ALmost every  single rule of thumb can  be identified
with a particular  concept (and all specializations of that concept).
It's almot as though the relevant heuristics were just  another facet
of each concept, like domain/range, examples, and definition.  We can
even  break  the  heuristics down  into  finer  categories,  for each
concept C, like  heuristics that deal  with checking a C,  heuristics
that deal  with creating a  new C, those  that deal with  judging how
interesting a particular instance of C is, and so on.

Hmmm. That would be a nice way to think of things, since it  provides
us with the framework we talked about to  pick the experts' brains, a
kind  of "questionaire" structuring.   We just  get them  to think of
heuristics  for each  particular  facet  of  each  concept  in  turn.
There's no reason  to put the resulting little  bundles of heuristics
together; just  leave them attached to the concepts' facets. Whenever
some concept C is the  center of attention, the only  heuristics that
the  system  need  worry about  are  those  attached  to C,  or  some
generalization of C, and probably only those attached to the relevant
facets of C that the system is working on.

But  what if  we're  thinking  about  performing an  operation,  like
Composing  two relations R1  and R2. Then  the heuristics  from all 3
concepts have  to work together.  How can  we arrange  that? Well,  a
reasonable Artificial Intelligence answer is:  give them all the same
representation, let them be simple cooperating knowledge sources. For
example,  they  could  all  be  predicate  calculus  statements,  and
combining  them would mean  conjoining them.  But that  would involve
logical manipulations to  conclude anything.   Or they  could all  be
productions,  and  combining  them would  mean  simply  dumping  them
together  into one production  system. I  like that image  better. We
still have  to decide  what the  left and  right hand  sides of  each
production rule will look like.  But let that go for now.

Let me go back to how to implement the concepts.  We may as well make
their  facets explicit, say  as properties on a  property list. Hmmm.
Some of  them  are  pretty trivial;  domain/range  should be  just  a
properly-formatted,  static   bit  of  data.    But   some  are  very
sophisticated; the algorithms part for  instance.  They will have  to
be procedural, little programs. Well that's all right.

What does the system actually do, now? It expands its knowledge, both
by  creating  new   concepts  and  by  finding  out  more  about  the
already-known ones.  That means  it creates new data structures  like
these, and it fills out the  properties of existing ones. In general,
the creation of a new one is directly suggested while filling in some
detail somewhere.  So the only real top-level activity  need be: pick
some facet of some concept, and try to find some new knowledge to put
in that slot.  During this process, some new concepts may be created.

What does it mean for a new one to be created?  Well, whoever created
it  should probably  know  its definition,  and  something about  its
worth. Probably also  how it connects to some existing concepts. Once
it's creatd, most of its facets will be empty, so there will  be many
new things for  the system to do. The space  of possible actions will
grow very rapidly.

For  those facets  which are  procedures, filling  them in  will mean
writing new little  programs. For example, there  should be a way  to
find an efficient algorithm  for doing multiplication once the system
discovered it.  Well, we  know enough about automatic programming  to
give the system those kinds of abilities.  The knowledge stored under
the  Algorithms concept, for  example, will  be automatic programming
techniques, which  can use,  say, the  definition of  any concept  to
produce an executable algorithm for it.

But there is an important gap in what I've told you, namely: how does
the  system decide what  to do  next!  This  was the same  problem we
discussed earlier, where there might be a million possible  things to
do at each moment.

The answer  is that the  system maintains  a job-list, a  sequence of
suggested  activities to  do, an ordered  list of  candidates for its
attention.   For example,  at one moment,  there might  be a  hundred
entries,  indicating that  the system  could Generalize  the relation
Object-equality somehow, check the examples of Compose that were just
filled  in, etc.    What  we now  have  to  decide is  precisely  how
candidates get entered onto the job-list, and how it gets ordered.

Perhaps the  most surprising design aspect of  the system is that the
primitive scheme  I implemented  on what  I thought  was a  temproary
basis works quite  adequately.  Each candidate job gets  entered by a
heuristic,  for example  this one is  suggested by  a heuristic which
recognized that  there  were too  few random  pairs  of objects  that
turned  out to  be  equal.   At  the moment  it  is entered,  whoever
suggested it  should know  some pretty  definite reasons  for why  it
might be worth  doing. So he  attaches a list  of reasons to  it, and
assigns  it a number from  0 to 1000.  If the job was  already in the
job-list, the  new  reasons  are  added, and  the  numeric  value  is
increased to  the square-root of the  sums of the squares  of old and
new numbers.

There  is a global  level of "tolerance",  which changes dynamically,
and no  job is  remembered if  its  value is  below this  threshhold.
Similarly, there  is a higher threshhold which  represents the lowest
number a job can have and actaully  be executed by the system. If  no
job has a high  enough rating, the system pauses and  spends a minute
or  two trying  to dredge  up new jobs.   All  the jobs  with ratings
between these  two  threshholds are  kept  around  just in  case  the
executing-threshhold decreases.

The assumption that  it makes sense to assign a  global value to each
candidate  is quite  dangerous, and  leads into a  very controversial
aspect of  my system: the  fact that  there has  to be a  rudimentary
calculus  of  interestingness.    For  example,  the  heuristic  that
suggested this generalization  had to  possess a  formula which  said
something like  "the numeric worth  of this  job (generalizing =)  is
estimated  to be  300 + 0.5*(value  of =)  + 10  * (the ratio  of the
number of things which turned out to be not-= to those which were =).
That  is,  we   are  assuming  that  there   is  an  easily-derivable
self-consistent  set  of  computational  rules  for  manipulating the
estimated worth  of various  concepts. Every concept  must have  some
numeric  description  of its  worth,  actually  a  vector of  numbers
describing it  along several  dimensions;  every heuristic  has  some
worth assigned to it, and has a formula for guessing the worth of any
job it suggests or new concept it proposes.

Predicate  calculus tries  to preserve  and manipulate  validity, and
differential calculus lets you make transformations while maintaining
equality, but this calculus tries only to preserve interestingness or
estimated worth.
Well, in any event, that was how AM was designed and created. You now know
all the essentials of its representations and algorithms. 

Step back for a moment and look over what's been done.
Notice how little of this whole discussion is specific to doing math research.
The design for the concept-grower can be used to realize
almost any kind of theory formation system for a hard empirical science.
The only change from one program to another would be the particular
starting concepts, the particular facets each concept could have, and
of course the individual heuristics would change. So the design of the
system could be the same. Perhaps if several such systems were created,
they could be run together, and cooperate.

There is one more assumption which you might have missed in all this, namely that
mathematics is an empirical science.
This is something which is hard to believe unless you've ever done some
math research. You quickly find that it has very little to do with the
smooth flowing proofs in textbooks. It's more like physics, where you
gather some data, look for regularities, and induce some conjectures.
But often they can't be phrased concisely with the existing
terminology. This is a clue that some new definitions should be made.
Or perhaps some concept is too rigid, too narrow to permit some conjecture
to hold. Then we can modify it, generalize or relax it in some way.
So if anything, math research proceeds in almost the opposite order from
the way it is finally reported in books and journals.


Now let's see what lessons we can learn from my system, that apply to
programs you might be working on.
The first observation is that if the heuristic operators are sophisticated
enough, then a simple numeric kind of calculus is all the meat-level
heuristics you need for adequate performance. And you don't need any
explicit meta-meta-heuristics at all.
It would be interesting to see if that is true in some sense for other
tasks as well.

So let me mention some of the problems I had programming the system.
The first
problem I had was naming it. I wanted to call it
SAM, for Semi-Automated-Mathematician, but someone beat me to that name.
Another suggestion was ACE, for A Cycle Eater, but I modestly decided to call
it the Automated Mathematician, AM for short.

A more serious problem
we all face is how to cope with changes in the system's knowledge
base. The standard solution is to track down all the effects of each change, and
update the whole data base. The solution AM uses, and perhaps people too, is to
just record the changed information, and ⊗4not⊗* to update everything.
When a contradiction of some kind occurs, then the conflicting knowledge
is checked, the knowledge ⊗4it⊗* was based on is checked, 
and so on, until we encounter
some changed opinion which explains the disagreement.
So the system is allowed to hold contradictory
viewpoints, so long as it could resolve any dispute if it had to.
This is one solution you might keep in mind for your knowledge-based systems.

Of course ⊗4the⊗* standard problem in our business is the tremendous gap
between English prose that sounds plausible, that will convince a human,
and LISP code that
will run on a PDP10. 
When talking to you, it is fine for me to say that
a Composition is interesting if
its domain and range sets coincide; 
but for AM, I have to analyze the affect numerically,
I have to specify precisely how to calculate the interest value  of the
resultant composition, say based on the interest values for the domain and
range sets, for the two constituent operations, for composition in general, etc.
The precision I need to instruct the machine still catches me unaware sometimes,
and I find myself underestimating the time I'll need to program some idea
I was sure would be a cinch.
This problem can be summed up as "Just because you can talk about something doesn't 
mean you understand it".  
There is no solution to this. It is one reason that we actually write these
programs.  
It implies that we can talk about more than
we really understand.
Which suggests to me that it's about time for me to stop talking.